We prove that the Tiden and Arnborg algorithm for equational unification modulo one-sided distributivity is not polynomial time bounded as previously thought. A set of counterexamples is developed that demonstrates that the algorithm goes through exponentially many steps.

1 Introduction 

Equational unification is central to automated deduction and its applications in areas such as symbolic protocol analysis. In particular, the unification problems for the theory AC ("Associativity-Commutativity") and its extensions ACI ("AC plus Idempotence") and ACUI ("ACI with Unit element") have been studied in great detail in the past. Distributivity (of one binary operator over another) has received comparatively less attention. Some significant results have been obtained, such as Schmidt-Schauß's breakthrough decidability result [7] for unification modulo the theory of two-sided distributivity:

$x \times (y + z) = (x \times y) + (x \times z)$, $(y + z) \times x = (y \times x) + (z \times x)$.

Other works include [3, 4].

One of the earliest papers that considered a subproblem of this is by Tiden and Arnborg [8]. They present an algorithm for equational unification modulo a one-sided distributivity axiom:

$x \times (y + z) = (x \times y) + (x \times z)$

This unification problem has recently been of interest in cryptographic protocol analysis since many 
cryptographic operators satisfy this property: for instance, modular exponentiation (used in the RSA 
and El Gamal public key algorithms) distributes over modular multiplication. Indeed, many electronic 
election protocols rely on the property of "homomorphic encryption" where encryption distributes over 
some other operator. (A new algorithm for this unification problem, using a novel approach, is given 



Our goal in this paper is to analyze the Tiden-Arnborg algorithm. We prove that the algorithm is not polynomial time bounded, as claimed in the Tiden-Arnborg paper. A set of counterexamples is outlined that demonstrates that the present algorithm goes through exponentially many steps.
1.1 The Tiden-Arnborg Algorithm

We present a very brief description of the algorithm of Tiden and Arnborg using deduction (inference) 
rules. First of all, it should be pointed out that what they consider is the elementary unification prob- 
lem 0, where the terms can only contain symbols in the signature of the theory and variables. (Thus free 
constants and free function symbols are not allowed.) Hence we can assume without loss of generality 
that the input is given as a set of equations where each equation is in one of the following forms: 

$X =^? Y$, $X =^? Y + Z$, and $X =^? Y \times Z$
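The reduction to these three forms is the usual flattening step: every non-variable subterm is named by a fresh variable. A minimal Python sketch, assuming terms are encoded as nested tuples (op, left, right) with "*" standing for ×, and fresh names _v0, _v1, ... that do not occur in the input (all of this is our own convention, not taken from [8]):

```python
from itertools import count


def flatten(lhs, rhs):
    """Flatten lhs =? rhs into equations of the forms X =? Y, X =? Y + Z and
    X =? Y x Z, encoded as ("=", X, Y), ("+", X, Y, Z) and ("*", X, Y, Z).

    A term is a variable name (str) or a triple (op, left, right) with op in
    {"+", "*"}. Fresh variables are called _v0, _v1, ... (assumed not to
    clash with the variable names of the input)."""
    fresh = count()
    eqs = []

    def name(t):
        # Return a variable standing for t, emitting equations for its subterms.
        if isinstance(t, str):
            return t
        op, left, right = t
        v = f"_v{next(fresh)}"
        eqs.append((op, v, name(left), name(right)))
        return v

    def define(var, t):
        # Emit var =? t with the immediate subterms of t replaced by variables.
        if isinstance(t, str):
            eqs.append(("=", var, t))
        else:
            op, left, right = t
            eqs.append((op, var, name(left), name(right)))

    if isinstance(lhs, str):
        define(lhs, rhs)
    else:
        define(name(lhs), rhs)
    return eqs


# x =? v x (y + z) becomes _v0 =? y + z and x =? v x _v0
print(flatten("x", ("*", "v", ("+", "y", "z"))))
```

Equations between two variables are kept as ("=", X, Y); they are exactly the ones that rule (a) below eliminates.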

The key steps in the algorithm can be described by the following deduction rules: 

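(a) $\{U =^? V\} \uplus S \;\Longrightarrow\; \{U =^? V\} \cup [V/U](S)$, if $U$ occurs in $S$

(b) $S \uplus \{U =^? V \times W,\ U =^? X \times Y\} \;\Longrightarrow\; S \cup \{U =^? V \times W,\ V =^? X,\ W =^? Y\}$

(c) $S \uplus \{U =^? V + W,\ U =^? X + Y\} \;\Longrightarrow\; S \cup \{U =^? V + W,\ V =^? X,\ W =^? Y\}$

(d) $S \uplus \{U =^? V \times W,\ U =^? X + Y\} \;\Longrightarrow\; S \cup \{U =^? V \times W,\ W =^? W_1 + W_2,\ X =^? V \times W_1,\ Y =^? V \times W_2\}$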



The variables $W_1$, $W_2$ in rule (d) are fresh, and $\uplus$ denotes disjoint union. Furthermore, rule (d) (the "splitting rule") is applied only when the other rules cannot be applied. A set of equations is said to be simple if and only if none of the rules (a), (b) and (c) can be applied to it. In other words, in a simple system every variable occurs as the left-hand side of at most one +-equation and at most one ×-equation, and hence of at most two equations. A sum transformation is defined as a binary relation between two simple systems $S_1$ and $S_2$, where $S_2$ is obtained from $S_1$ by applying rule (d), followed by repeated exhaustive applications of rules (a), (b) and (c). Clearly, a sum transformation is applicable if and only if some variable occurs as the left-hand side in more than one equation.
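The interplay of rules (a) to (d) can be sketched in Python as follows. This is an illustration under our own assumptions, not the data structures of [8]: rule (a) is realised by a union-find structure over variables, rules (b) and (c) by merging equations that share a left-hand side, and the names SimpleSystem, splittable, sum_transformation and the fresh names _w1, _w2, ... are ours. The sketch performs no occur check, so it should only be iterated on systems known to terminate.

```python
from itertools import count


class SimpleSystem:
    """A simple system: after normalisation every representative variable has
    at most one + equation and at most one x equation (written "*" here)."""

    def __init__(self, equations):
        self.parent = {}  # union-find forest: models rule (a)
        self.plus = {}    # representative U -> (X, Y) for U =? X + Y
        self.times = {}   # representative U -> (V, W) for U =? V x W
        self.fresh = count(1)
        self._normalize(list(equations))

    def find(self, v):
        """Representative of the class of variables identified with v."""
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def _normalize(self, pending):
        """Apply rules (a), (b) and (c) exhaustively. Equations are tuples
        ("=", U, V), ("+", U, X, Y) or ("*", U, V, W)."""
        while pending:
            eq = pending.pop()
            if eq[0] == "=":
                a, b = self.find(eq[1]), self.find(eq[2])
                if a == b:
                    continue
                self.parent[a] = b  # rule (a): a is replaced by b
                for op, table in (("+", self.plus), ("*", self.times)):
                    if a in table:  # re-file a's equation under b
                        pending.append((op, b, *table.pop(a)))
            else:
                op, u, v, w = eq
                u = self.find(u)
                table = self.plus if op == "+" else self.times
                if u in table:  # rules (b)/(c): same operator, same left side
                    v0, w0 = table[u]
                    pending += [("=", v0, v), ("=", w0, w)]
                else:
                    table[u] = (v, w)

    def splittable(self):
        """Variables with both a + and an x equation, i.e. where (d) applies."""
        return sorted(self.plus.keys() & self.times.keys())

    def sum_transformation(self):
        """Rule (d) at the first splittable variable, then rules (a)-(c).
        Returns False when no sum transformation is applicable."""
        candidates = self.splittable()
        if not candidates:
            return False
        u = candidates[0]          # arbitrary choice for this sketch
        x, y = self.plus.pop(u)    # U =? X + Y is consumed
        v, w = self.times[u]       # U =? V x W is kept
        w1, w2 = f"_w{next(self.fresh)}", f"_w{next(self.fresh)}"
        self._normalize([("+", w, w1, w2), ("*", x, v, w1), ("*", y, v, w2)])
        return True
```

For instance, SimpleSystem([("*", "U", "V", "W"), ("+", "U", "X", "Y")]) admits exactly one sum transformation, after which it contains W =? _w1 + _w2, X =? V × _w1 and Y =? V × _w2 besides U =? V × W.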

Detection of failure is done using a kind of "extended occur-check" using two graph-based data structures. We repeat the definitions of the graph structures and give a sketch of the algorithm presented in Tiden and Arnborg [8] for the convenience of the reader.

Definition 1.1. The dependency graph of a simple system $\Sigma$ is an edge-colored, directed multigraph. It has as vertices the variables of $\Sigma$. For an equation $x = y + z$ in $\Sigma$ it has an $l_+$-colored edge $(x, y)$ and an $r_+$-colored edge $(x, z)$. An equation $x = y \times z$ similarly generates two edges with colors $l_\times$ and $r_\times$.

Definition 1.2. The sum propagation graph of a simple system $\Sigma$ is a directed simple graph. It has as vertices the equivalence classes of the symmetric, reflexive, and transitive closure of the relation defined by the $r_\times$-edges in the dependency graph of $\Sigma$. It has an edge $(V, W)$ iff there is an edge in the dependency graph from a vertex in $V$ to a vertex in $W$ with color $l_+$ or $r_+$.

The dependency graph structure is sufficient for finding all the occur-check-like errors that may develop as the algorithm works with the system of equations. The propagation graph is needed to detect non-unifiable systems that cause infinitely many applications of the splitting rule (d). An example of this type of system is the following pair of equations:

$Z =^? V_2 + V_3$, $Z =^? V_1 \times V_3$.

Rule (d) turns these into, among others, $V_3 =^? W_1 + W_2$ and $V_3 =^? V_1 \times W_2$, which again call for rule (d), and so on indefinitely. Such systems have no unifier, and since they never produce a cycle in the dependency graph, the propagation graph is needed to detect them.
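To make Definitions 1.1 and 1.2 concrete, here is a small Python sketch (our own encoding, not Tiden and Arnborg's data structures): plus and times map a variable U to the pair (V, W) of its equation U =? V + W or U =? V × W, and the colours are written l+, r+, lx, rx. We count a self-loop in the propagation graph as a cycle; that is our reading of the definition, and it is what flags the two-equation example above, whose dependency graph is acyclic.

```python
from collections import defaultdict


def dependency_graph(plus, times):
    """Coloured edges (source, target, colour) of the dependency graph
    (Definition 1.1); plus/times map U to (V, W) for U =? V + W / U =? V x W."""
    edges = []
    for u, (v, w) in plus.items():
        edges += [(u, v, "l+"), (u, w, "r+")]
    for u, (v, w) in times.items():
        edges += [(u, v, "lx"), (u, w, "rx")]
    return edges


def propagation_graph(plus, times):
    """Edges of the sum propagation graph (Definition 1.2); each class of the
    rx-equivalence is represented by one of its members."""
    edges = dependency_graph(plus, times)
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v, colour in edges:
        if colour == "rx":
            parent[find(u)] = find(v)  # merge the classes of u and v
    return {(find(u), find(v)) for u, v, colour in edges if colour in ("l+", "r+")}


def has_cycle(edges):
    """Iterative depth-first search on (source, target) pairs.
    A self-loop counts as a cycle."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    state = {}  # 1 = on the current DFS path, 2 = fully explored
    for root in list(succ):
        if root in state:
            continue
        state[root] = 1
        stack = [(root, iter(succ[root]))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:
                state[node] = 2
                stack.pop()
            elif state.get(child) == 1:
                return True
            elif child not in state:
                state[child] = 1
                stack.append((child, iter(succ[child])))
    return False
```

For the example, Z and $V_3$ fall into the same class through the $r_\times$-edge, so the $r_+$-edge from Z to $V_3$ becomes a loop in the propagation graph.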

Tiden and Arnborg give a polynomial time procedure for producing a simple system of equations from an initial set of equations. We sketch their unification algorithm starting from an initial simple system.



Algorithm 1 UNIFY($\Sigma_1$)
Require: Simple system $\Sigma_1$.
k := 1
while the sum transformation can be applied to $\Sigma_k$ do
  if either the dependency graph or the propagation graph of $\Sigma_k$ contains a cycle then stop with failure
  Using the sum transformation, compute $\Sigma_{k+1}$ from $\Sigma_k$
  k := k + 1
end while
Compute the most general unifier (mgu) by back substitution.



It is shown that if a system is not unifiable it will, after finitely many applications of the sum trans- 
formation, produce a cycle in one of the graphs. It is also shown that if a system is unifiable then the 
algorithm will produce the mgu. 

In the next section we present a family of unifiable systems that produce no cycles in either graph, 
but require exponentially many applications of the sum transformation. 

2 Counterexamples 

We present a family of unifiable simple systems on which the Tiden-Arnborg algorithm runs in exponential time. For ease of exposition, we only use the letters T, x and y for variables, along with subscripts for x and y which are strings over the alphabet {1, 2}.

Definition 2.1. Let EQ be the set of simple systems of the following form: all multiplications are of the form $x_i =^? T \times y_j$ (or $y_j =^? T \times x_i$), where T is a unique variable, and all additions are of the form $x_i =^? x_{i1} + x_{i2}$ or $y_i =^? y_{i1} + y_{i2}$.

As the left variable of the multiplication operation does not affect the complexity result, we use the unique variable T in this position. This makes the proof simpler. Thus the splitting rule (d) above can be viewed as

$S \uplus \{U_i =^? T \times W_j,\ U_i =^? U_{i1} + U_{i2}\} \;\Longrightarrow\; S \cup \{U_i =^? T \times W_j,\ W_j =^? W_{j1} + W_{j2},\ U_{i1} =^? T \times W_{j1},\ U_{i2} =^? T \times W_{j2}\}$

where $U, W \in \{x, y\}$.
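In this restricted setting a system can be stored very compactly. The hypothetical helper below (our own encoding, assuming that the children of a variable u are always named u1 and u2, as in the systems used in this section) applies the restricted rule once and checks that rule (b) is never needed:

```python
def sum_transformation(plus, times, u):
    """Restricted splitting rule for systems in EQ, applied at variable u.

    plus  -- set of variables u with an equation u =? u1 + u2, where the
             children are named by appending "1" and "2" (naming assumption)
    times -- dict mapping u to w for an equation u =? T x w
    Both structures are modified in place.
    """
    if u not in plus or u not in times:
        raise ValueError(f"rule (d) does not apply at {u}")
    w = times[u]
    plus.discard(u)  # U_i =? U_i1 + U_i2 is consumed
    plus.add(w)      # W_j =? W_j1 + W_j2; rule (c) merges it if already present
    for suffix in "12":
        child, target = u + suffix, w + suffix
        # U_ik =? T x W_jk; a different existing target would trigger rule (b)
        if times.get(child, target) != target:
            raise ValueError(f"{child} already points elsewhere; rule (b) needed")
        times[child] = target


# x =? x1 + x2 and x =? T x y: one application at x.
plus, times = {"x"}, {"x": "y"}
sum_transformation(plus, times, "x")
print(sorted(plus), times)  # ['y'] {'x': 'y', 'x1': 'y1', 'x2': 'y2'}
```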

Specifically, we examine the complexity of unifying a set of equations from EQ. It will be shown that to 
achieve a unifier, the Tiden-Arnborg algorithm requires exponentially many steps. 
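Definition 2.2. For $n \ge 0$, let $\sigma(n)$ be the set of equations

$x_{1^j} =^? x_{1^{j+1}} + x_{1^j 2}$, $y_{2^j} =^? y_{2^j 1} + y_{2^{j+1}}$, $y_{2^j 1} =^? T \times x_{1^j 2}$ for all $0 \le j \le n$,

together with $x =^? T \times y$ and $x_{1^{n+1}} =^? x_{1^{n+2}} + x_{1^{n+1} 2}$. Here $1^j$ (resp. $2^j$) denotes the string of $j$ 1's (resp. 2's); for $j = 0$ it is empty, so that $x_{1^0} = x$ and $y_{2^0} = y$.

Thus $\sigma(0)$ is $\{x =^? x_1 + x_2,\ y =^? y_1 + y_2,\ x_1 =^? x_{11} + x_{12},\ x =^? T \times y,\ y_1 =^? T \times x_2\}$.

Similarly $\sigma(2)$ is $\{x =^? x_1 + x_2,\ y =^? y_1 + y_2,\ x_1 =^? x_{11} + x_{12},\ y_2 =^? y_{21} + y_{22},\ x_{11} =^? x_{111} + x_{112},\ y_{22} =^? y_{221} + y_{222},\ x_{111} =^? x_{1111} + x_{1112},\ x =^? T \times y,\ y_1 =^? T \times x_2,\ y_{21} =^? T \times x_{12},\ y_{221} =^? T \times x_{112}\}$.

Note that $\sigma(k+1) = \sigma(k) \cup \{y_{2^{k+1}} =^? y_{2^{k+1}1} + y_{2^{k+2}},\ y_{2^{k+1}1} =^? T \times x_{1^{k+1}2},\ x_{1^{k+2}} =^? x_{1^{k+3}} + x_{1^{k+2}2}\}$ for all $k \ge 0$.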
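As a sanity check on Definition 2.2, the following Python sketch (function names ours, "*" standing for ×) generates $\sigma(n)$ and counts its operator symbols; it can be used to confirm the counts $m(\sigma(n)) = n + 2$ and $p(\sigma(n)) = 2n + 3$ used in Section 3, as well as the recurrence for $\sigma(k+1)$.

```python
def sigma(n):
    """The equations of sigma(n) (Definition 2.2), encoded as tuples
    ("+", u, v, w) for u =? v + w and ("*", u, w) for u =? T x w."""
    eqs = {("*", "x", "y")}
    for j in range(n + 1):
        x, y = "x" + "1" * j, "y" + "2" * j  # x_{1^j} and y_{2^j}
        eqs.add(("+", x, x + "1", x + "2"))
        eqs.add(("+", y, y + "1", y + "2"))
        eqs.add(("*", y + "1", x + "2"))     # y_{2^j 1} =? T x x_{1^j 2}
    last = "x" + "1" * (n + 1)               # x_{1^{n+1}}
    eqs.add(("+", last, last + "1", last + "2"))
    return eqs


def count_ops(eqs):
    """(m, p): numbers of x and + symbols, one per equation here."""
    m = sum(1 for e in eqs if e[0] == "*")
    return m, len(eqs) - m


for n in range(4):
    print(n, count_ops(sigma(n)))
```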



Definition 2.3. We call a variable $x_i$ (or $y_i$) a peak iff there are equations $x_i =^? x_{i1} + x_{i2}$ and $x_i =^? T \times y_j$ (or $y_i =^? y_{i1} + y_{i2}$ and $y_i =^? T \times x_j$).



We claim that a system of equations, as defined in Definition 2.2 will result in exponentially many 
applications of the sum transformation rule. 



3 Proof 

For a set of equations S, let m(S) denote the number of × symbols in it and p(S) denote the number of + symbols in it. Consider the sets of equations defined in Definition 2.2. By the analysis in [8], the number of sum transformations should be bounded by $m(S) \cdot p(S)$. According to Definition 2.2, $m(\sigma(n)) = n + 2$ and $p(\sigma(n)) = 2n + 3$, so the upper bound should be $(n + 2)(2n + 3) = 2n^2 + 7n + 6$. However, the actual number of sum transformations for the systems $\sigma(n)$ will be shown to be $2^{n+3} - (n + 4)$.
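The gap between the two quantities is easy to tabulate. In the sketch below (helper names ours), actual_count simply takes the value $2^{n+3} - (n + 4)$ proved in this section as given; the polynomial bound is first exceeded at n = 3, where 57 > 45.

```python
def claimed_bound(n):
    # m(sigma(n)) * p(sigma(n)) = (n + 2)(2n + 3) = 2n^2 + 7n + 6
    return (n + 2) * (2 * n + 3)


def actual_count(n):
    # number of sum transformations on sigma(n), as shown in this section
    return 2 ** (n + 3) - (n + 4)


def first_violation(limit=100):
    """Smallest n < limit at which the actual count exceeds the m*p bound."""
    return next((n for n in range(limit) if actual_count(n) > claimed_bound(n)), None)


for n in range(6):
    print(n, claimed_bound(n), actual_count(n))
```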



We can view the sets of equations defined in Definition 2.2 as tree-like graphs. Nodes correspond to variables. We first add a dummy root node with outdegree 2 whose children are the initial nodes x and y. The summation equations are represented by downward edges, from every parent node to its two children. We represent the multiplication equations as lateral edges, i.e., edges between nodes at the same level (distance from the root node). (Thus the graph is not really a tree if lateral edges are considered.) Because all left multiplication edges go to T and have no effect on the complexity of the algorithm in these systems of equations, we leave these edges out of the diagrams for clarity. Let G(n) be the graph of $\sigma(n)$. See Figure 1 for G(0). Note that the height of this tree is 3, i.e., there are 3 levels. In general, the graph of $\sigma(n)$ has $n + 3$ levels. We view the algorithm as proceeding down the tree, with the sum transformations at a level completed before starting at the next level. We analyze the complexity of the Tiden-Arnborg algorithm in terms of the transformations done on the graph as the algorithm proceeds. We show that if $l$ is the height of the tree, then the number of sum transformations applied is $2^l - (l + 1)$.

Observe that a variable is a peak if and only if its node has both downward edges and an outgoing lateral edge. Figure 2 shows the effect of a sum transformation at a peak on the graph. Note that lateral edges are never deleted. Each application of the sum transformation increases the number of lateral edges by at most 2.



Figure 2: Sum transformation 



Note also that, other than at the lowest level (depth $n + 3$), the graph initially has exactly one multiplication equation, i.e., exactly one edge between nodes at the same height, at each level. We can also see that the graph is partitioned into a left (x) side and a right (y) side, and that at level one there is an edge from x to y. However, at all lower levels the initial edge between nodes of the same level goes from y to x.

We can also see that, given the graph as described above, each time the sum transformation is applied, the peak moves either from the x side of the graph to the y side or from the y side to the x side, and the new peak was not previously a peak. To see this, take any system as defined by Definition 2.2 and examine its graph. Initially, all edges between nodes at the same height go from one side to the other. In this restricted formulation of Definition 2.2, these same-level edges are the only multiplication equations. This ensures that any time a sum transformation is performed on some equation $x_i =^? T \times y_j$ or $y_i =^? T \times x_j$, the new edges created by the sum transformation must, by definition, go either from x to y or from y to x, because there are no multiplication equations of the form $x_i =^? T \times x_j$. The fact that a new peak was not previously a peak follows from Definition 2.2 and the definition of the sum transformation. Since we assume a simple system of equations, there are no two distinct equations of the form $x_i =^? x_j + x_k$, $x_i =^? x_l + x_m$ (i.e., with the same variable on the left-hand side); likewise for y. Once the sum transformation is applied to the equations $x_i =^? T \times y_j$ and $x_i =^? x_{i1} + x_{i2}$, the downward edges from $x_i$ to both $x_{i1}$ and $x_{i2}$ are removed (see Figure 3). Thus $x_i$ can never become a peak again.




Figure 3: Graph after one application of the sum transformation at node x, creating a new peak at node $x_1$ and new edges from $x_1$ to $y_1$ and from $x_2$ to $y_2$

Lemma 3.1. Sum transformations at level k create a peak at $x_{1^k}$ at level $k + 1$, provided $k + 1$ is not the lowest level.



Proof. This follows inductively from the form of the graph in Definition 2.2. First, from the definition of the set of equations, the first peak is located at x. After the first sum transformation a peak is created at $x_1$. Assume this has propagated to level k, i.e., there is a peak at $x_{1^{k-1}}$. The sum transformation applied to it adds an equation of the form $x_{1^k} =^? T \times y_{1^k}$, which creates a peak at the next level, provided there is already an equation $x_{1^k} =^? x_{1^{k+1}} + x_{1^k 2}$. □



Lemma 3.2. There are no lateral edges from nodes corresponding to variables of the form $y_{2^j}$ for $j > 0$. In other words, no equations of the form $y_{2^j} =^? T \times x_k$ are generated.



Proof. This follows inductively from Definition 2.2. In any initial system there is no lateral edge from any node $y_{2^k}$. By the definition of the sum transformation, an outgoing lateral edge from $y_{2^k}$ can be created only if its parent node $y_{2^{k-1}}$ is a peak, and in particular has an outgoing lateral edge. □

We can also notice a fact about the order in which the nodes become peaks via the sum transformation. The order is a right-to-left lexicographic order of the digits of the nodes' indices (i.e., subscripts). For example, for level 4 the sequence is $x_{111} \to y_{111} \to x_{211} \to y_{211} \to x_{121} \to y_{121} \to x_{221} \to y_{221} \to x_{112} \to y_{112} \to x_{212} \to y_{212} \to x_{122} \to y_{122} \to x_{222} \to y_{222}$. Note that $y_{222}$ is not necessarily a peak but is added to illustrate the path. Based on this observation we have the following lemma:

Lemma 3.3. At any level of the tree, if $x_i =^? T \times y_j$ is an equation (i.e., if there is a lateral edge from $x_i$ to $y_j$), then $i = j$. Similarly, if $y_i =^? T \times x_j$ is an equation, then $j = \mathrm{revlex}(i)$, where $\mathrm{revlex}(i)$ is the lexicographic successor of the index $i$ of the node $y_i$, comparing digits starting from the last one (i.e., from right to left).
Proof. This follows inductively from Definition 2.2 and the sum transformation. The base cases are $x =^? T \times y$ (level 1) and $y_1 =^? T \times x_2$ (level 2). Assume the property holds for level k. We now show that all the equations introduced at level $k + 1$ by sum transformations at level k also satisfy it. If $y_i =^? T \times x_{\mathrm{revlex}(i)}$ is an equation at level k and $y_i =^? y_{i1} + y_{i2}$ is an equation (i.e., $y_i$ is a peak at level k), then applying the sum transformation results in $y_{i1} =^? T \times x_{\mathrm{revlex}(i)1}$ and $y_{i2} =^? T \times x_{\mathrm{revlex}(i)2}$. Now note that $\mathrm{revlex}(i1) = \mathrm{revlex}(i)1$ and $\mathrm{revlex}(i2) = \mathrm{revlex}(i)2$, since $i$ is not a string of 2's ($y_i$ has an outgoing lateral edge, so by Lemma 3.2 it is not of the form $y_{2^j}$). Note also that if k is not the lowest level, then there will already be an equation $y_{2^{k-1}1} =^? T \times x_{1^{k-1}2}$ at level $k + 1$, but this does not violate the property in the lemma since $\mathrm{revlex}(2^{k-1}1) = 1^{k-1}2$.

If $x_i =^? T \times y_i$ and $x_i =^? x_{i1} + x_{i2}$, then by application of the sum transformation $x_{i1} =^? T \times y_{i1}$ and $x_{i2} =^? T \times y_{i2}$, and the result follows. □
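The map revlex and the right-to-left order can be written down directly. In this Python sketch (our own formulation, indices as strings over "1" and "2"), the successor is obtained by increasing the leftmost digit and carrying to the right, which is lexicographic succession when digits are compared from right to left; the all-2 index has no successor.

```python
from itertools import product


def revlex(i):
    """Successor of the index i in lexicographic order read from right to
    left: bump the leftmost digit, carrying to the right. Returns None for
    the all-2 string, which has no successor."""
    digits = list(i)
    for pos, d in enumerate(digits):
        if d == "1":
            digits[pos] = "2"
            return "".join(digits)
        digits[pos] = "1"  # a 2 wraps around to 1 and the carry moves right
    return None


def right_to_left_order(length):
    """All indices of the given length in right-to-left lexicographic order."""
    return ["".join(reversed(p)) for p in product("12", repeat=length)]


# The level-4 sequence of peaks listed before Lemma 3.3:
print(" -> ".join(v + i for i in right_to_left_order(3) for v in "xy"))
```

Each index in right_to_left_order(k) is followed by its revlex successor, and revlex(i + s) equals revlex(i) + s whenever i is not all 2's, which is the identity used in the proof above.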



Lemma 3.4. If, at some point, there is a path of lateral edges from node $u_1$ to node $v_l$ in the graph and node $u_1$ is a peak, then every node on the path, except possibly $v_l$, will become a peak at some point.



Proof. Straightforward, by induction on the length of the path. □ 
Lemma 3.5. At every level $k < n + 3$, a path of lateral edges from $x_{1^{k-1}}$ to $y_{2^{k-1}}$ is created.



Proof. For brevity, we refer to such paths as RL paths. Clearly there already is an RL path at level 1. We show that if an RL path exists at level k and $k + 1 < n + 3$, then an RL path will be created at level $k + 1$.



By Lemma 3.1 there will be a peak at $x_{1^{k-1}}$, and by Lemma 3.4 every node on the path other than $y_{2^{k-1}}$ will become a peak. This creates, at level $k + 1$, edges of the form $x_{i1} =^? T \times y_{i1}$, $x_{i2} =^? T \times y_{i2}$, $y_{i1} =^? T \times x_{\mathrm{revlex}(i)1}$ and $y_{i2} =^? T \times x_{\mathrm{revlex}(i)2}$ for every $i \ne 2^{k-1}$. Since the edge $y_{2^{k-1}1} =^? T \times x_{1^{k-1}2}$ is already there to begin with, we get the RL path at level $k + 1$. □



Lemma 3.6. At each level $k < n + 3$ of the graph, the sum transformation can be applied $2^k - 1$ times.



Proof. This follows from Lemma 3.5. At each level k, $2^k$ nodes are eventually created. Since an RL path is created, the sum transformation must be applied to all of these nodes except $y_{2^{k-1}}$, resulting in $2^k - 1$ applications at each level k. □

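Theorem 3.7. For a graph of height $l$, $2^l - (l + 1)$ sum transformations are used. In particular, $\sigma(n)$, whose graph has height $n + 3$, requires $2^{n+3} - (n + 4)$ sum transformations.

Proof. This easily follows from Lemmas 3.1–3.6 and the fact that $\sum_{k=1}^{l-1} (2^k - 1) = 2^l - l - 1$. □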
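Theorem 3.7 can also be checked mechanically for small n. The Python sketch below (our own encoding and function names, not code from [8]) stores $\sigma(n)$ as a set of variables carrying a + equation and a map for the × equations, repeatedly applies the restricted sum transformation at some peak, and records the peaks used. We expect the choice of peak not to affect the count here, and simply pick the smallest name.

```python
from collections import Counter


def sigma(n):
    """sigma(n) (Definition 2.2) as (plus, times): plus holds every u with
    u =? u1 + u2, times maps u to w for each u =? T x w."""
    plus = {"x" + "1" * j for j in range(n + 2)} | {"y" + "2" * j for j in range(n + 1)}
    times = {"x": "y"}
    times.update({"y" + "2" * j + "1": "x" + "1" * j + "2" for j in range(n + 1)})
    return plus, times


def run(n):
    """Apply sum transformations to sigma(n) until no peak is left and
    return the peaks in the order in which they were used."""
    plus, times = sigma(n)
    trace = []
    while peaks := sorted(plus.intersection(times)):
        u = peaks[0]     # any peak will do; we take the smallest name
        w = times[u]
        plus.discard(u)  # the + equation at the peak is consumed
        plus.add(w)      # the lateral target gets w =? w1 + w2
        for suffix in "12":
            child, target = u + suffix, w + suffix
            if times.get(child, target) != target:
                raise RuntimeError(f"conflicting x equations at {child}")
            times[child] = target
        trace.append(u)
    return trace


def per_level(trace):
    """Number of sum transformations per level; level = length of the name."""
    return dict(Counter(len(u) for u in trace))


print(run(0))  # ['x', 'x1', 'y1', 'x2']
```

For $\sigma(0)$ the trace reproduces the four steps of the example in Section 4; for larger n the trace length equals $2^{n+3} - (n + 4)$ and per_level gives $2^k - 1$ applications at each level $k < n + 3$.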

We see that the current algorithm fails to achieve polynomial complexity for at least a subset of possible unification problems. Further counterexamples may be found that also cause this exponential growth with the sum transformation. This naturally raises the question of whether a polynomial time algorithm can be found, either by a modification of the current algorithm or by a new approach.
4 An Illustrated Example



In this section we give an example of the process on a system of equations defined as in Definition 2.2. We begin with $\sigma(0)$, i.e., the following set of initial equations:

$x =^? T \times y$,
$x =^? x_1 + x_2$,
$y =^? y_1 + y_2$,
$y_1 =^? T \times x_2$,
$x_1 =^? x_{11} + x_{12}$.



This can be represented by a graph as shown in Figure 4. Note that the first peak is located at node x.

Figure 4: Graph for $\sigma(0)$

The first peak, x, is selected and the sum transformation is applied, resulting in the removal of the equation $x =^? x_1 + x_2$ from the set of equations and the addition of the two equations $x_1 =^? T \times y_1$ and $x_2 =^? T \times y_2$. The new edges are directed from x to y because the multiplication equation at the peak to which the sum transformation was applied also goes from x to y. After the sum transformation is applied, x is no longer a peak because $x =^? x_1 + x_2$ has been removed, but now the node $x_1$ is a peak due to the addition of $x_1 =^? T \times y_1$ (see Lemma 3.6). This new graph is shown in Figure 5.
Figure 5: After one application of the sum transformation



We see that the sum transformation can be applied only once at level 1. At the next level we continue the process, beginning with the new peak $x_1$. Applying the sum transformation at $x_1$ removes $x_1 =^? x_{11} + x_{12}$ from the set of equations and adds two new edges, $x_{11} =^? T \times y_{11}$ and $x_{12} =^? T \times y_{12}$, to the set of equations. The two new y nodes are also created, adding $y_1 =^? y_{11} + y_{12}$ to the set of equations. The result is that $x_1$ is no longer a peak, but now $y_1$ is (see Lemma 3.3). The resulting graph can be seen in Figure 6. Also, now that $y_1$ is the peak, the direction of the multiplication path has switched to the direction from $y_1$ to $x_2$ (see Lemma 3.1).







Figure 6: After two applications of the sum transformation 



We can now continue the process, applying the sum transformation to the peak at $y_1$. This removes $y_1 =^? y_{11} + y_{12}$ from the set of equations and adds $x_2 =^? x_{21} + x_{22}$, creating a peak at node $x_2$ and removing the peak at node $y_1$. Lastly, a third sum transformation at this level is applied to node $x_2$, removing $x_2 =^? x_{21} + x_{22}$ from the set of equations and adding $y_2 =^? y_{21} + y_{22}$. Note that because there is no multiplication edge from $y_2$ to any x node, $y_2$ is not a peak, and no more sum transformations can be applied at the current level. Because no node at the next level is a peak either, we stop with a total of 4 applications of the sum transformation. The final graph is shown in Figure 7.




Figure 7: After 4 applications of the sum transformation 

5 Conclusions 

We have shown that the Tiden-Arnborg algorithm does not run in polynomial time, contrary to the claim in [8]. It is also not hard to see that the algorithm produces exponentially large mgus for the systems $\sigma(n)$. However, it may still be that the unifiability problem, i.e., deciding whether a unifier exists modulo this theory, is in P. We are currently working on this and related problems.



References 

[1] Siva Anantharaman, Hai Lin, Christopher Lynch, Paliath Narendran & Michael Rusinowitch (2010): Cap unification: application to protocol security modulo homomorphic encryption. In: Dengguo Feng, David A. Basin & Peng Liu, editors: ASIACCS, ACM, pp. 192-203. Available at http://doi.acm.org/10.1145/1755688.1755713.

[2] Franz Baader & Wayne Snyder (2001): Unification Theory. In: John Alan Robinson & Andrei Voronkov, editors: Handbook of Automated Reasoning, Elsevier and MIT Press, pp. 445-532.

[3] Evelyne Contejean (1993): A Partial Solution for D-Unification Based on a Reduction to AC1-Unification. In: Andrzej Lingas, Rolf G. Karlsson & Svante Carlsson, editors: ICALP, Lecture Notes in Computer Science 700, Springer, pp. 621-632. Available at http://dx.doi.org/10.1007/3-540-56939-1_107.

[4] Evelyne Contejean (1993): Solving *-Problems Modulo Distributivity by a Reduction to AC1-Unification. J. Symb. Comput. 16(5), pp. 493-521.

[5] Jean-Pierre Jouannaud & Claude Kirchner (1991): Solving Equations in Abstract Algebras: A Rule-Based Survey of Unification. In: Computational Logic - Essays in Honor of Alan Robinson, pp. 257-321.

[6] Hai Lin (2009): Algorithms for Cryptographic Protocol Verification in Presence of Algebraic Properties. Ph.D. thesis, Clarkson University.

[7] Manfred Schmidt-Schauß (1998): A Decision Algorithm for Distributive Unification. Theor. Comput. Sci. 208(1-2), pp. 111-148. Available at http://dx.doi.org/10.1016/S0304-3975(98)00081-4.

[8] Erik Tiden & Stefan Arnborg (1987): Unification Problems with One-Sided Distributivity. J. Symb. Comput. 3(1/2), pp. 183-202.



